__author__ = 'Alice Jacques <alice.jacques@noirlab.edu>, Astro Data Lab Team <datalab@noirlab.edu>'
__version__ = '20211227' #yyyymmdd
__datasets__ = ['ls_dr9','sdss_dr16','gaia_dr2','des_dr1','smash_dr2','unwise_dr1','allwise','nsc_dr2']
__keywords__ = ['crossmatch','image cutout']
by Alice Jacques and the Astro Data Lab Team
For examples using the pre-crossmatched tables hosted at Astro Data Lab, see our Examples using the pre-crossmatched tables notebook.
The crossmatch tables at Astro Data Lab are named as follows:
schema1.xNpN__table1__schema2__table2
where the N in NpN encode the numerical value of the crossmatching radius (since dots '.' are not allowed in table names).
Example:
ls_dr9.x1p5__tractor__nsc_dr2__object
is a crossmatch table (indicated by the leading x after the dot '.'), located in the ls_dr9 schema, and it crossmatches the ls_dr9.tractor table with the nsc_dr2.object table (which lives in the nsc_dr2 schema) within a 1.5 arcseconds radius ('1p5') .
This is admittedly long, but clean, consistent, and most importantly, parsable. The use of double-underscores '__' is to distinguish from single underscores often used in schema and table names.
All pre-crossmatched tables have only these seven columns: id1, ra1, dec1, id2, ra2, dec2, distance. Column descriptions in the pre-crossmatched table contain the original column names in parentheses (this makes them parsable).
For example:
ls_dr9.x1p5__tractor__nsc_dr2__object
| Column | Description | Datatype |
|---|---|---|
| id1 | ID in left/first table (ls_id) | BIGINT |
| ra1 | Right ascension in left/first table (ra) | DOUBLE |
| dec1 | Declination in left/first table (dec) | DOUBLE |
| id2 | ID in right/second table (id) | VARCHAR |
| ra2 | Right ascension in right/second table (ra) | DOUBLE |
| dec2 | Declination in right/second table (dec) | DOUBLE |
| distance | Distance between ra1,dec1 and ra2,dec2 (arcsec) | DOUBLE |
allwise.x1p5__source__des_dr2__main and des_dr2.x1p5__main__allwise__source.The list of available crossmatch tables can be viewed in our table schema browser under their respective schema.
If you use this notebook for your published science, please acknowledge the following:
Data Lab concept paper: Fitzpatrick et al., "The NOAO Data Laboratory: a conceptual overview", SPIE, 9149, 2014, http://dx.doi.org/10.1117/12.2057445
Data Lab disclaimer: https://datalab.noirlab.edu/disclaimers.php
# std lib
from getpass import getpass
# 3rd party
from astropy.utils.data import download_file #import file from URL
from matplotlib.ticker import NullFormatter
import pylab as plt
import matplotlib
%matplotlib inline
# Data Lab
from dl import authClient as ac, queryClient as qc, storeClient as sc
from dl.helpers.utils import convert # converts table to Pandas dataframe object
Much of the functionality of Data Lab can be accessed without explicitly logging in (the service then uses an anonymous login). But some capacities, for instance saving the results of your queries to your virtual storage space, require a login (i.e. you will need a registered user account).
If you need to log in to Data Lab, issue this command, and respond according to the instructions:
#ac.login(input("Enter user name: (+ENTER) "),getpass("Enter password: (+ENTER) "))
ac.whoAmI()
'demo00'
We can use Data Lab's Query Client to access the pre-crossmatched tables hosted by Data Lab. First let's get a total count of the number of objects (nrows) in SDSS DR16 that are also in LS DR9:
%%time
query="SELECT nrows FROM tbl_stat WHERE schema='sdss_dr16' and tbl_name='x1p5__specobj__ls_dr9__tractor'"
# Call query manager
response = qc.query(sql=query)
print(response)
nrows 4554541 CPU times: user 30.3 ms, sys: 1.11 ms, total: 31.4 ms Wall time: 96.9 ms
Now let's print just the first 100 rows:
%%time
query = "SELECT * FROM sdss_dr16.x1p5__specobj__ls_dr9__tractor LIMIT 100"
result = qc.query(sql=query,fmt='pandas')
result
CPU times: user 28.8 ms, sys: 4.35 ms, total: 33.2 ms Wall time: 104 ms
| id1 | ra1 | dec1 | id2 | ra2 | dec2 | distance | |
|---|---|---|---|---|---|---|---|
| 0 | 3384465917919389696 | 287.22826 | 48.064735 | 9907737095837650 | 287.228165 | 48.064735 | 0.227509 |
| 1 | 3384466192797296640 | 287.44889 | 48.229698 | 9907737159009377 | 287.448870 | 48.229697 | 0.049226 |
| 2 | 3384462344506599424 | 287.38750 | 48.168965 | 9907737158950166 | 287.387517 | 48.168933 | 0.121452 |
| 3 | 3384463718896134144 | 287.69779 | 48.382804 | 9907737221862367 | 287.697861 | 48.382752 | 0.252610 |
| 4 | 3384465093285668864 | 287.54718 | 48.407654 | 9907737221859120 | 287.547174 | 48.407548 | 0.381269 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 3384469766210086912 | 287.60106 | 48.844872 | 9907737284317775 | 287.601034 | 48.844932 | 0.223774 |
| 96 | 3384471690355435520 | 287.70990 | 48.888661 | 9907737346511898 | 287.709937 | 48.888637 | 0.123855 |
| 97 | 3384469491332179968 | 287.66389 | 48.944252 | 9907737346510986 | 287.663800 | 48.944491 | 0.887495 |
| 98 | 3384480486448457728 | 287.22115 | 48.827232 | 9907737284252188 | 287.221105 | 48.827183 | 0.206944 |
| 99 | 3384477737669388288 | 287.29420 | 48.927487 | 9907737346445470 | 287.294186 | 48.927487 | 0.032387 |
100 rows × 7 columns
Here we compare the speed of using the q3c_join() function to crossmatch directly in a query (query1) versus using a pre-crossmatched table and a JOIN statement (query2). We retrieve the same specified columns and the same random rows for the two queries. We will see that query2 retrieves results faster than query1 can retrieve results.
%%time
query1 = """
SELECT
a.source_id AS id1, a.ra AS ra1, a.dec AS dec1,
gg.specobjid AS id2, gg.ra AS ra2, gg.dec AS dec2,
(q3c_dist(a.ra, a.dec, gg.ra, gg.dec)*3600) AS distance
FROM
allwise.source AS a
INNER JOIN LATERAL (
SELECT
s.specobjid, s.ra, s.dec
FROM
sdss_dr16.specobj AS s
WHERE
q3c_join(a.ra, a.dec, s.ra, s.dec, 1.5/3600.0)
ORDER BY
random()
ASC LIMIT 1
) AS gg ON true
WHERE
a.random_id BETWEEN 10 and 10.15
"""
df1 = qc.query(sql=query1,fmt='pandas')
df1 = df1.sort_values('id1')
CPU times: user 53 ms, sys: 11.7 ms, total: 64.7 ms Wall time: 13.2 s
%%time
query2 = """
SELECT
X.id1, X.id2, X.ra1, X.dec1, X.ra2, X.dec2, X.distance
FROM
allwise.x1p5__source__sdss_dr16__specobj AS X
JOIN
allwise.source AS a ON X.id1 = a.source_id
WHERE
a.random_id BETWEEN 10 and 10.15
"""
df2 = qc.query(sql=query2,fmt='pandas')
df2 = df2.sort_values('id1')
CPU times: user 48.3 ms, sys: 13.7 ms, total: 62 ms Wall time: 2.64 s
Here we plot the two distance histograms to demonstrate that the results obtained by both the JOIN and the q3c crossmatch queries are identical.
plt.hist(df1['distance'],histtype='step',color='b',lw=3,ls='-',alpha=0.5,label='Crossmatching manually')
plt.hist(df2['distance'],histtype='step',color='r',lw=4,ls=':',alpha=0.5,label='Using pre-crossmatched table')
plt.xlabel('angular distance (arcsec)')
plt.ylabel('number of sources')
plt.legend(loc='upper right',frameon=False)
plt.show()
We again select objects from two catalogs and retrieve the same specified columns and the same random rows for two queries. query3 uses a pre-crossmatched table and a JOIN query while query4 crossmatches directly in the query using the q3c_join() function. We will see that query3 retrieves results faster than query4 can retrieve results.
%%time
query3 = """
SELECT
X.id1, X.id2, X.ra1, X.dec1, X.ra2, X.dec2, X.distance
FROM
unwise_dr1.x1p5__object__sdss_dr16__specobj AS X
JOIN
unwise_dr1.object AS u ON X.id1 = u.unwise_objid
WHERE
u.random_id BETWEEN 10 and 10.1
"""
df3 = qc.query(sql=query3,fmt='pandas',timeout=600)
df3 = df3.sort_values('id1')
CPU times: user 39.8 ms, sys: 14.7 ms, total: 54.4 ms Wall time: 3.95 s
%%time
query4 = """
SELECT
u.unwise_objid AS id1, u.ra AS ra1 ,u.dec AS dec1,
ss.specobjid AS id2, ss.ra AS ra2, ss.dec AS dec2,
(q3c_dist(u.ra, u.dec, ss.ra, ss.dec)*3600.0) AS distance
FROM
unwise_dr1.object AS u
INNER JOIN LATERAL (
SELECT
s.specobjid, s.ra, s.dec
FROM
sdss_dr16.specobj AS s
WHERE
q3c_join(u.ra, u.dec, s.ra, s.dec, 1.5/3600.0)
ORDER BY
random()
ASC LIMIT 1
) as ss ON true
WHERE
u.random_id BETWEEN 10 and 10.1
"""
df4 = qc.query(sql=query4,fmt='pandas',timeout=600)
df4 = df4.sort_values('id1')
CPU times: user 53.3 ms, sys: 9.12 ms, total: 62.4 ms Wall time: 25.5 s
Here we plot the two distance histograms to demonstrate that the results obtained by both the JOIN and the q3c crossmatch queries are identical.
plt.hist(df4['distance'],histtype='step',color='b',lw=3,ls='-',alpha=0.5,label='Crossmatching manually')
plt.hist(df3['distance'],histtype='step',color='r',lw=4,ls=':',alpha=0.5,label='Using pre-crossmatched table')
plt.xlabel('angular distance (arcsec)')
plt.ylabel('number of sources')
plt.legend(loc='upper right',frameon=False)
plt.show()
A clear benefit of pre-crossmatched tables is that they contain the positions of the same objects in two datasets. We can use this to e.g. fetch images of an object from both surveys.
Here we will compare two images of the same object from two different catalogs, unWISE DR1 and LS DR9.
def make_cutout_comparison_table(ra_in1, dec_in1, layer1, layer2, pixscale, ra_in2=None, dec_in2=None):
"""
Obtain color JPEG images from Legacy Survey team cutout tool at NERSC
"""
img1 = []
img2 = []
for i in range(len(ra_in1)):
cutout_url1 = "https://www.legacysurvey.org/viewer/cutout.jpg?ra=%g&dec=%g&layer=%s&pixscale=%s" % (ra_in1[i],dec_in1[i],layer1,pixscale)
img = plt.imread(download_file(cutout_url1,cache=True,show_progress=False,timeout=120))
img1.append(img)
cutout_url2 = "https://www.legacysurvey.org/viewer/cutout.jpg?ra=%g&dec=%g&layer=%s&pixscale=%s" % (ra_in2[i],dec_in2[i],layer2,pixscale)
img = plt.imread(download_file(cutout_url2,cache=True,show_progress=False,timeout=120))
img2.append(img)
return img1,img2
def plot_cutouts(img1,img2,cat1,cat2):
"""
Plot images in two rows with 5 images in each row
"""
fig = plt.figure(figsize=(21,7))
for i in range(len(img1)):
ax = fig.add_subplot(2,6,i+1)
ax.imshow(img1[i])
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
ax.tick_params(axis='both',which='both',length=0)
ax.text(0.02,0.93,'ra=%.5f'%list_ra1[i],transform=ax.transAxes,fontsize=12,color='white')
ax.text(0.02,0.85,'dec=%.5f'%list_dec1[i],transform=ax.transAxes,fontsize=12,color='white')
ax.text(0.02,0.77,cat1,transform=ax.transAxes,fontsize=12,color='white')
ax = fig.add_subplot(2,6,i+7)
ax.imshow(img2[i])
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
ax.tick_params(axis='both',which='both',length=0)
ax.text(0.02,0.93,'ra=%.5f'%list_ra2[i],transform=ax.transAxes,fontsize=12,color='white')
ax.text(0.02,0.85,'dec=%.5f'%list_dec2[i],transform=ax.transAxes,fontsize=12,color='white')
ax.text(0.02,0.77,cat2,transform=ax.transAxes,fontsize=12,color='white')
plt.subplots_adjust(wspace=0.02, hspace=0.03)
... then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
%%time
q = """
SELECT
ra1, dec1, ra2, dec2
FROM
unwise_dr1.x1p5__object__ls_dr9__tractor
WHERE
ra1>300 and dec1>33
ORDER BY
random()
LIMIT 5
"""
r = qc.query(sql=q,fmt='pandas')
list_ra1=r['ra1'].values # ".values" convert to numpy array
list_dec1=r['dec1'].values
list_ra2=r['ra2'].values
list_dec2=r['dec2'].values
cat1='unWISE'
cat2='ls dr9'
layer1='unwise-neo6'
layer2='ls-dr9'
pixscale='0.3'
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,list_ra2,list_dec2)
plot_cutouts(img1,img2,cat1,cat2)
CPU times: user 400 ms, sys: 31.9 ms, total: 431 ms Wall time: 58.3 s
Here we will compare two images of the same object from two different catalogs, SDSS and DES DR1.
... then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
%%time
q = """
SELECT
ra1, dec1, ra2, dec2
FROM
sdss_dr16.x1p5__specobj__des_dr1__main
ORDER BY
random()
LIMIT 5
"""
r = qc.query(sql=q,fmt='pandas')
list_ra1=r['ra1'].values # ".values" convert to numpy array
list_dec1=r['dec1'].values
list_ra2=r['ra2'].values
list_dec2=r['dec2'].values
cat1='sdss dr16'
cat2='des dr1'
layer1='sdss'
layer2='des-dr1'
pixscale='0.25'
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,list_ra2,list_dec2)
plot_cutouts(img1,img2,cat1,cat2)
CPU times: user 388 ms, sys: 26.7 ms, total: 415 ms Wall time: 9.89 s
We compare two images of the same galaxy from two different catalogs, SDSS and DES DR1. We use a list of identified galaxies (RA/Dec positions) to compare the difference in observable features and quality between the two catalogs.
First we import the CSV file of identified galaxies (RA/Dec positions) into MyDB:
qc.mydb_import('gals','./gals.csv',drop=True)
'OK'
We write the query to select the first five RA/Dec positions from our table. We then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
%%time
qg = "SELECT ra, dec FROM mydb://gals LIMIT 5"
rg = qc.query(sql=qg)
rp = convert(rg)
list_ra1=rp['ra'].values
list_dec1=rp['dec'].values
list_ra2=rp['ra'].values
list_dec2=rp['dec'].values
cat1='sdss dr16'
cat2='des dr1'
layer1='sdss'
layer2='des-dr1'
pixscale='0.5'
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,ra_in2=list_ra1,dec_in2=list_dec1)
plot_cutouts(img1,img2,cat1,cat2)
CPU times: user 331 ms, sys: 11.7 ms, total: 343 ms Wall time: 593 ms
We write the next query to select the next five RA/Dec positions from our table. We then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
%%time
qg = "SELECT ra, dec FROM mydb://gals LIMIT 5 OFFSET 5"
rg = qc.query(sql=qg)
rp = convert(rg)
list_ra1=rp['ra'].values
list_dec1=rp['dec'].values
list_ra2=rp['ra'].values
list_dec2=rp['dec'].values
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,ra_in2=list_ra1,dec_in2=list_dec1)
plot_cutouts(img1,img2,cat1,cat2)
CPU times: user 326 ms, sys: 13.5 ms, total: 339 ms Wall time: 540 ms
We write the next query to select the last five RA/Dec positions from our table. We then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
%%time
qg = "SELECT ra, dec FROM mydb://gals LIMIT 5 OFFSET 10"
rg = qc.query(sql=qg)
rp = convert(rg)
list_ra1=rp['ra'].values
list_dec1=rp['dec'].values
list_ra2=rp['ra'].values
list_dec2=rp['dec'].values
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,ra_in2=list_ra1,dec_in2=list_dec1)
plot_cutouts(img1,img2,cat1,cat2)
CPU times: user 495 ms, sys: 33 ms, total: 528 ms Wall time: 755 ms
Legacy Survey Sky Browser: https://www.legacysurvey.org/viewer#NGC%203098